Abstract

As part of the Remote Exploration and
Experimentation Project (REE), a proton single event
effects (SEE) evaluation of the Myricom network
protocol system (Myrinet) was performed. This testing
included the evaluation of the Myrinet crossbar switch
and the Network Interface Card (NIC). To this end, two
crossbar switch devices and five components on the NIC
were exposed to the proton beam at the University of
California at Davis Crocker Nuclear Laboratory
(CNL).

I. Introduction 

The Remote Exploration and Experimentation
Project (REE) was part of NASA's High
Performance Computing and Communications
Program. An effort was under way to place a
commercial-off-the-shelf (COTS) supercomputer in
space. The architecture being investigated was a
multi-processor system connected to a prime
controller that was hardened for space. The network
system being evaluated for use in this
application was Myrinet.

For this system to be useful in the space
environment, the network electronics should not
introduce undue radiation susceptibility into the
overall system architecture. To evaluate the Myrinet
system for space use, crossbar switches and network
interface cards were exposed, at room temperature, to
the proton environment at the University of
California at Davis Crocker Nuclear Laboratory (63
MeV beam).

The following sections will describe the details of 
the devices that were tested, the hardware systems 
used to evaluate these devices, the software used, and 
the results of the testing. 

II. Devices Tested 

Two device types were tested for this work. The first
was the crossbar switch manufactured by Myricom,
Inc., which provides the interconnectivity in the
Myrinet model (as a hub would in a star-configuration
network model). The second was a Peripheral Component
Interconnect (PCI)-bus network interface card (NIC),
also manufactured by Myricom. On the NIC, five
devices were arbitrarily chosen for exposure, and the
system was evaluated for its response to their
exposure. All devices used in this testing are listed
in Table I. The official description of the Myrinet
standard appears in its entirety in an ANSI
document [1].

A. 16-Port Crossbar

The crossbar switch (Xbar) device type is the 
essential component interconnecting devices residing 
on the network. The 16-port Xbar switch tested 
allows 16 devices to connect in any configuration, 
one with another (no broadcast or group 
connections). The bandwidth of Myrinet is described 
as the data rate available in the “forward” direction 
plus the bandwidth available in the opposite 
direction. 

There are different Myrinet speed standards. The
Xbar and the NIC types tested are capable of
operating at the Myrinet-2000 data rate of 2 gigabits
per second (Gbps) in both directions simultaneously
(full duplex). Thus, the data rate is expressed as 2000
+ 2000. Each single-direction 2000 Mbps path is
referred to as a channel. The opposite-direction pair
of these channels is referred to as a link.

TABLE I
Device Under Test (DUT) Table

Device        Vendor    Location        Model Number
Switch 1      Myricom   Switch Board 1  M3-SW16-8S
Switch 2      Myricom   Switch Board 2  M3-SW16-8S
NIC           Myricom   NIC             M3S-PCI64B-2
Lanai9        Myricom   NIC             r 9.1
SerDeSer      Myricom   NIC             i.i
PCIDMA        Myricom   NIC             1.3
Transceiver   Vitesse                   VCS7146RH
SRAM          Samsung                   K7N803601M

The Xbar Device Under Test (DUT) is a 0.25 µm
commercial CMOS ASIC manufactured for Myricom
to perform this switching function. Each DUT acquired
was pre-mounted on a printed circuit board (see
Figure 1). No other components on the DUT board were
irradiated.

Each Xbar Integrated Circuit (IC), shown in Figure
2, has 16 System Area Network (SAN) ports. The
Xbar IC and SAN specification for Myrinet is
described in [2]. Briefly, SAN is a parallel data and
control signal format for short-haul connections
(components no more distant than within one rack).
Eight of these SAN links are brought to the front
panel through a serializer/deserializer for connection
to external components. The other eight ports are
connected to the backplane connector for SAN
connection to other components within the chassis
that holds these cards.

Providing 16 serial ports requires more than one 
Xbar card; the other eight backplane ports must be 
made available at the front panel. This is 
accomplished with a “spline” card, which does not 
contain an Xbar but merely converts the eight SAN 
channels to/from serial format. An inefficient 
arrangement of two Xbar cards can be used in place 
of one Xbar and one spline. A single Xbar card can 
be used if the eight backplane ports are not required. 
Some of this testing involved a single Xbar card and 
the rest involved two Xbar cards because the proton 
irradiation was sufficiently penetrating to hit both 
Xbar ICs, which are one above the other. This gave
data on many more paths (described below in the 
DUT System description). 

Messages are transported across a Myrinet as one
or more packets. Packets are encoded with routing
information that allows them to reach the desired
destination. Each pass through an Xbar (in a large
network many Xbar transits may be required to reach
the destination) consumes one byte of routing
information, which gives the output port relative to
the incoming port. These routing bytes are removed as
they are used, and the Cyclic Redundancy Check (CRC)
is recalculated and appended so that the new packet
(1 byte shorter) is correctly formatted. Packets that
have inconsistent CRCs are simply dropped. This
behavior is hard-wired within each component
(within the LANai9 processor on the NIC, and
within the Xbar IC). That is unfortunate for SEE
testing: events are detectable only by the failure of
a packet to arrive. No examination of erroneous data
is possible.



Figure 1. The Myrinet Xbar card showing the 
backplane SAN ports at top and serial front panel 
ports. 



Figure 2. Close-up of the Xbar IC. The total height
of the IC above the board is 1.4 mm.


B. Network Interface Card

The Network Interface Card (NIC) provides the
functionality for a device (a PCI-bus computer, as are
most desktops currently in use) to communicate via
the Myrinet-2000 standard [3]. It is a PCI-64
form-factor card that can also operate from a 32-bit
PCI bus. It operates at either 5 V dc for PCI32
operation or 3.3 V for either PCI32 or PCI64
operation. The card operates at either 33 MHz or
66 MHz PCI bus speed, with ICs that provide the bus
interface (including PCI Direct Memory Access
(PCIDMA)), protocol processing, and serialization and
deserialization (SerDeSer) functions. The primary
focus was on the protocol processing IC, which is
called the “LANai 9 processor”.

Operating in typical PC situations (33 MHz PCI32
bus) allows a data rate of 132 MB/s. Operation at the
maximum data rate (66 MHz PCI64 bus) is still
limited by either the host computer’s capabilities or
the PCI bus limits. The theoretical limit is 528 MB/s;
the actual rate during testing was near that limit. To
achieve the full 2 Gbps rate, a 64-bit, 200 MHz bus
and high-speed processors are required. Figure 3 shows
the NIC. The additional signals for PCI-64 operation
can be seen hanging to the right of the white PCI-32
socket. The black Myrinet cable can be seen at far
left.
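
The nominal PCI peak rates quoted above follow from one
bus-width transfer per clock cycle. The short Python
sketch below reproduces that arithmetic; the function
name is hypothetical, and the nominal 33 MHz and 66 MHz
clock values are assumed rather than measured.

```python
def pci_peak_mb_per_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Peak PCI transfer rate in MB/s, assuming one full-width
    transfer per clock cycle (no wait states or arbitration)."""
    return clock_mhz * bus_width_bits / 8


def channel_mb_per_s(rate_mbps: float) -> float:
    """Convert a serial channel rate in Mbit/s to MB/s."""
    return rate_mbps / 8


if __name__ == "__main__":
    for clock_mhz, width in [(33, 32), (66, 64)]:
        rate = pci_peak_mb_per_s(clock_mhz, width)
        print(f"{clock_mhz} MHz, {width}-bit PCI: {rate:.0f} MB/s")
    # One Myrinet-2000 channel (single direction) for comparison.
    print(f"2000 Mbps channel: {channel_mb_per_s(2000):.0f} MB/s")
```

These are upper bounds only; sustained rates on a real
host are lower because of bus arbitration and host
overhead.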



Figure 3. Myrinet NIC installed in PCI-32 
motherboard without extender. 


III. Test Hardware 

The test system consists of two subsystems, the 
Test Controller and the DUT subsystems. Any 
cabling not required for the operation of the computer 
on which the NIC resides or the Xbar switch, along 
with Myrinet cabling is considered part of the Test 
Controller subsystem. These subsystems are 
described below. Figure 4 illustrates the overall test 
configuration. 

A. Test Controller Subsystem 

The Test Controller hardware is based on the PCI 
Extensions for Instrumentation (PXI) specification. 
The PXI subsystem, shown in Figure 5, resides 
outside the irradiation area and is connected to the 
DUT at the irradiation point by cabling, 
approximately 40 feet long. It consists of the PXI 
components, the PXI Computer <-> DUT System 
cabling and the user interface. 

The PXI components include the PXI chassis,
which contains an embedded controller (running
Win98, the LabVIEW™ (LV) environment, and a custom
LV application), a signal switch matrix, and two
digital multimeters (DMMs) in the voltage
measurement mode. The switch matrix provides two
functions: the multiplexing of analog signals to one
of the DMMs, and contact closures (pulling signal
levels to ground). One DMM measures all analog
values except the one that is read most frequently or
that is most important not to delay by switch settling
time. The other DMM is dedicated to monitoring that
value.

Figure 4. Block diagram of the test system.

Figure 5. Block diagram of the PXI subsystem.

The PXI Computer’s user interface, network 
connectivity (for data file access) and AC power feed
are also components of the PXI Computer. An 
extended (via a CAT5 cable based extender from 
Cybex, Inc.) keyboard/monitor/mouse user interface 
provides user control of the PXI computer from the 
user facility (which in this instance is located in the 
hallway outside the regular, restricted access, user 
area). 

Most of the PXI Computer <-> DUT System 
cabling leaves the PXI subsystem from the switch 
matrix. Exceptions are the AC power cable to power 
the DUT System and a serial (RS-232) cable for 
telemetry/command of the DUT System computer 
(telemetry originates within the DUT System 
computer, commands originate within the PXI
Controller). 

B. DUT System 

The DUT System consists of the computer in 
which the NIC resides and to which the Xbar is 
connected. It includes components mounted directly 
to the motherboard, components located nearby (e.g. 
disk drives) and connected via cables, and a Cybex 
extended keyboard/monitor/mouse user interface. 

The DUT system computer motherboard resides in 
the test chamber, positioned just below the particle 
beam when the NIC is exposed. The NIC plugs into 
an extension socket which raises it up by 
approximately two inches. The dual 1 GHz P3
processors on the motherboard are Flip-Chip Pin Grid
Array (FC-PGA) form-factor so they lie very low
and well out of the particle beam, as do the low 
profile (<1”) RAM modules. 

Located nearby (approximately 6 feet) are a 
modified standard PC ATX power supply (PS), a 
floppy and/or hard disk drive, and a Cybex user 
interface extension identical to the one used to extend 
the PXI computer. 

The motherboard is modified to allow power 
cycling and reset via the PXI switch matrix. The 
ATX power supply is modified to allow forced power
shutoffs. The PCI-64 extension board, which the NIC 
plugs into, is modified to sample DUT current via the 
PXI switch matrix and DMMs. 

The NIC is connected via a Myrinet cable (Figure 6)
to the rest of the DUT system, the Xbar switches.
These are housed in a chassis containing its own
AC power supply.

The motherboard is modified to allow connection 
to two controlling signals, both momentary contact 
closures. The motherboard front panel power on/off 
(MotherPonoff) input signal is controlled by the PXI 
switch matrix, as is the motherboard front panel soft 
reset (MotherSR) input signal. 

The ATX PS on/off state is normally controlled by a
constant signal from the motherboard (the ATX PS
supplies a standby +5 V to power such motherboard
functions). This signal (PS_ON#) is, approximately, a
latched toggle of the front panel signal,
MotherPonoff. The motherboard PS_ON# signal is
disconnected from the ATX power supply’s PS_ON#
input so that the input can be controlled directly
from the PXI. This additional control is necessary
because the computer can hang to the extent of not
responding to the normal on/off commands. The ATX PS
AC power is extended back to the user facility.

The DUT Computer runs the Windows-NT™
operating system and a software application that
accesses the Myrinet NIC drivers. Commands from the
PXI computer are received via an Ethernet cable and
responses are transmitted back via the same link.



Figure 6. Close-up of Myrinet serial cable ends. 


Currents and voltages, from as many as three 
devices (one NIC and two Xbars), were monitored. 
System cabling was designed to allow four 
current/voltage samples in one subD 15-pin 
connector cable. A cable assembly was added to 
trifurcate three signals to separate locations. 

DUT system signals that are controlled by the PXI 
subsystem, as described above, or by the user from 
the user facility are shown in Table II. 

DUT computer signals that are monitored by the 
PXI or directly by the users in the user facility are 
shown in Table III. 


TABLE II
DUT System Signals Controlled by the PXI System

Name            Destination          Description
PS_ON#          ATX power supply     Hold low (0 V) for PS on;
                                     open = high = off
MotherPonoff    Motherboard power    Pulse low (0 V) to toggle
                switch connector     power on and off
MotherSR        Motherboard reset    Pulse low (0 V) to
                switch connector     initiate reset
Command         DUT system computer  CAT-5 cable, Ethernet,
                                     10/100 Mbps rate. Same
                                     cable that carries
                                     Telemetry data.
Keyboard/mouse  DUT system computer  PS-2 keyboard ports


C. Test Software 

The DUT software for Myrinet testing was written
in Microsoft C++ Professional version 6.0. It was
designed to run in Windows 2000 Professional
service pack 2. The driver for the Myrinet network
adapter was GM 1.1. This driver was downloaded
from the Myricom website (http://www.myri.com/).

The Network Interface Card (NIC) takes data 
packets from the driver and sends/receives the 
packets through the cables and network switches. The 
receive function of this card rejects data packets 
when errors are detected. The method used for 
detecting errors is a CRC check byte at the end of 
each packet. 


TABLE III
DUT System Signals Monitored by the PXI System

Name           Source               Description
on             NIC extender card    Voltage and current samples
                                    of the NIC primary supply.
                                    Twisted shielded pair (TSP).
VXbar, IXbar1  First Xbar card      Voltage and current samples
                                    of the first/only Xbar
                                    switch card supply. TSP.
VXbar, IXbar2  Second Xbar card     Voltage and current samples
                                    of the second Xbar switch
                                    card supply, if installed.
                                    TSP.
Telemetry      DUT system computer  CAT-5 cable, Ethernet,
                                    10/100 Mbps rate. Same
                                    cable that carries Command
                                    data.
GUI output     DUT system computer  Video carrying output to
               VGA card             the user facility.


The DUT Software sends packets with an 
incrementing packet # and data which is a function of 
the packet #. If the packet number/16 is odd, then the 
data is a stream of bytes with the value hex 55, 
otherwise, it is a stream of bytes with the value hex 
AA. After each packet is sent, the program waits until 
either a packet is received or approximately 10 
microseconds, whichever comes first. There are two 
physical setups supported. The first setup uses one 
NIC for both sending and receiving. The second 
setup uses two NICs, one for sending and one for 
receiving. 
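
The alternating fill pattern described above can be
sketched in a few lines of Python. This is an
illustration only, not the original C++ code: the
function names are hypothetical, "packet number/16" is
assumed to mean integer division, and how the packet
number itself is embedded in the packet is not stated
in the text and is not modeled here.

```python
def payload_byte(packet_number: int) -> int:
    """Fill byte for a packet: 0x55 when packet_number // 16 is odd,
    otherwise 0xAA (integer division is an assumption)."""
    return 0x55 if (packet_number // 16) % 2 == 1 else 0xAA


def build_payload(packet_number: int, size: int = 4088) -> bytes:
    """Build a payload of `size` identical fill bytes.
    4088 bytes is the default packet size of the DUT software."""
    return bytes([payload_byte(packet_number)]) * size


if __name__ == "__main__":
    for n in (0, 15, 16, 31, 32):
        print(n, hex(payload_byte(n)))
```

The two fill values, 0x55 and 0xAA, are bitwise
complements, so every data bit is exercised in both
states as the pattern alternates every 16 packets.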

The DUT is connected through a Transmission
Control Protocol/Internet Protocol (TCP/IP) socket to
the test controller system, where the test controller
system acts as the host and the DUT acts as a client.
The IP address and port used for the test controller
connection are hard-coded. When not connected, the
DUT tries once every 3 seconds to make a
connection. The DUT sends telemetry information to
the test controller system and records the same
telemetry to a file on the DUT hard disk drive. The
telemetry consists of a stream of 4-byte long integers
sent LSB first with the following format:

// The last byte of 4 is a data
// code. The table below shows the
// definitions for each code:
//
// FF timestamp and beam info
//    xx xx yy FF
//    xx xx        relative timestamp
//    yy           01 for beam on, 00 for
//                 beam off
// FE Error in data packet
//    xx xx yy FE
//    xx xx        location within packet
//    yy           data read
// FD Skipped Packet(s)
//    xx xx xx FD
//    xx xx xx     Number of skipped
//                 packets
// FC Skipped Packet(s) (Large/-)
//    00 00 00 FC
//    xx xx xx xx
//    xx xx xx xx  Number of skipped
//                 packets
// FB Buffer overflow
//    00 00 00 FB
// FA Header
//    AA AA AA FA
//    aa aa aa aa  Ascii Version
//    tt tt tt tt  Time Stamp
//    rr rr rr rr  Route in
//    rr rr rr rr
//    rr rr rr rr
//    rr rr rr rr
//    ff ff ff ff  Filename
//    ff ff ff ff
//    ff ff ff ff
//    ff ff ff ff
//    ff ff ff ff
//    ff ff ff ff
//    ff ff ff ff
//    ff ff ff ff
//    pp pp pp pp  Packet Size
// F9 Reconnect
//    00 00 00 F9
// F8 Packet Number
//    xx xx xx xx
//    xx xx xx xx  Packet Number
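
As an illustration of how the single-word records in
this format could be decoded, here is a minimal Python
sketch. It is not the original software. It assumes
that multi-byte fields inside a word are little-endian
(consistent with "LSB first", but not stated per
field), and it leaves the multi-word records (FC, FA,
F8) undecoded. The function name and dictionary keys
are hypothetical.

```python
def decode_word(word: bytes) -> dict:
    """Decode one 4-byte telemetry word, given in the order sent.
    The last byte is the data code."""
    if len(word) != 4:
        raise ValueError("telemetry words are 4 bytes long")
    code = word[3]
    if code == 0xFF:  # timestamp and beam info: xx xx yy FF
        return {
            "type": "timestamp",
            "timestamp": int.from_bytes(word[0:2], "little"),
            "beam_on": word[2] == 0x01,
        }
    if code == 0xFE:  # error in data packet: xx xx yy FE
        return {
            "type": "data_error",
            "location": int.from_bytes(word[0:2], "little"),
            "data_read": word[2],
        }
    if code == 0xFD:  # skipped packet(s): xx xx xx FD
        return {"type": "skipped", "count": int.from_bytes(word[0:3], "little")}
    if code == 0xFB:  # buffer overflow: 00 00 00 FB
        return {"type": "buffer_overflow"}
    if code == 0xF9:  # reconnect: 00 00 00 F9
        return {"type": "reconnect"}
    # FC, FA and F8 start multi-word records and are not handled here.
    return {"type": "unhandled", "code": code}


if __name__ == "__main__":
    print(decode_word(bytes([0x34, 0x12, 0x01, 0xFF])))
    print(decode_word(bytes([0x05, 0x00, 0x00, 0xFD])))
```

A full reader would also need the lengths of the
multi-word records so that it can stay aligned on
4-byte boundaries while walking the stream.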

The DUT software utilizes two methods of data
transfer. The first is the standard
gm_send_with_callback(). The second is the
undocumented gm_raw_send_with_callback().

The standard method of transfer uses handshaking
that ensures that the data is received without any
detected errors before the send is completed. If any
errors are detected, the data is resent until it is
received without detected errors or a timeout of about
a minute is reached. Before running the DUT
software in this mode, the GM utility program
gm_mapper_service must be executed. This cannot
be executed while the DUT software is running. In
this mode the speed of data transfer can be set using
gm_set_speed. This method was not used in this
testing.

The undocumented method of transfer (Raw
Mode) uses no handshaking. If the data is received
with detected errors, it is rejected by the NIC and is
never seen by the user software. The user software
detects when a packet has been skipped and when a
received packet contains errors that the NIC did not
detect. When packets are skipped, the packet number of
the packet received after the skip and the number of
skipped packets are recorded. When errors are found
within a packet, the packet number, the locations
within the packet, and the actual values of the bytes
in error are recorded.
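
The receive-side check described above can be sketched
as follows. This Python fragment is a simplified
illustration under stated assumptions, not the original
C++ code: the function name and arguments are
hypothetical, packet numbers are assumed to increase by
one without wrap-around, and the expected fill byte is
passed in by the caller.

```python
def check_packet(expected_number: int, received_number: int,
                 payload: bytes, expected_byte: int):
    """Compare a received packet against what was expected.

    Returns (skipped, errors): `skipped` is the number of packets
    missing before this one, and `errors` is a list of
    (location, value) pairs for payload bytes that differ from
    the expected fill byte."""
    skipped = received_number - expected_number
    errors = [(i, b) for i, b in enumerate(payload) if b != expected_byte]
    return skipped, errors


if __name__ == "__main__":
    # Packets 10..12 never arrived; packet 13 has one corrupted byte.
    print(check_packet(10, 13, bytes([0xAA, 0xAB, 0xAA]), 0xAA))
```

A nonzero skip count corresponds to packets dropped by
the hardware on a CRC failure, while a non-empty error
list corresponds to corruption that occurred after the
NIC had accepted the packet.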

The DUT software is controlled through buttons 
and checkboxes on the DUT console. All of these can 
be manipulated through the keyboard and mouse of 
the DUT computer. Some of these can be controlled 
through the TCP/IP connection by the test controller 
system. These are controlled from the test controller 
by sending a one-byte command to the DUT. The 
following can be controlled both by the DUT and by 
the test controller system (The values at the end are 
the values for the command byte from the test 
controller system): 

1. The button called "Run" is pressed to start
logging data and to start sending data
when "Loopback" is checked. (1-Run)

2. The button called "End" is pressed to
terminate logging and to stop sending
data when "Loopback" is checked. (2-End)

3. The checkbox on the console called
"Beam" is checked when the beam is on.
(3-Check; 4-Uncheck)

4. The checkbox on the console called
"Loopback" is checked when using one
NIC and not checked when using two.
(5-Check; 6-Uncheck)

5. The checkbox "Raw Mode" is checked
when using the raw data transfer mode
and not checked when using the standard
transfer mode. (7-Check; 8-Uncheck)
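
The command values listed above can be collected into a
small lookup, sketched here in Python. The enumeration
and function names are hypothetical, and it is an
assumption that the command byte carries the raw
integer value (for example 0x01 for Run) rather than an
ASCII digit; the text does not say which.

```python
from enum import IntEnum


class Command(IntEnum):
    """One-byte commands from the test controller to the DUT."""
    RUN = 1
    END = 2
    BEAM_CHECK = 3
    BEAM_UNCHECK = 4
    LOOPBACK_CHECK = 5
    LOOPBACK_UNCHECK = 6
    RAW_MODE_CHECK = 7
    RAW_MODE_UNCHECK = 8


def encode_command(command: Command) -> bytes:
    """Encode a command as the single byte sent over the TCP/IP socket
    (raw integer encoding is an assumption)."""
    return bytes([command])


if __name__ == "__main__":
    print(encode_command(Command.RUN))
```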


The following can only be controlled from the
DUT:

1. The "Route" button and edit box are used
to set the route that the data takes through
the switches when using the raw data
transfer mode.

2. The "Packet Size" edit box is used to set
the data packet size (4088 is the default,
4096 is the max).

3. The "Directory" edit box is used to select
the directory into which the telemetry
files are stored.

4. The "Suffix" edit box is used to select text
which is appended to "Run*", where *
is the run number, when forming the file
name for the telemetry file. The run
number is incremented each time the
"End" button is pressed.

D. Test Methodology 

In this simple test, the main objectives were to
observe what effects would be induced by proton
irradiation, with particular concern for the latchup
sensitivity of any parts. To achieve these goals, the
main devices of both the crossbar switch and the
network interface card (NIC) were placed in
the proton beam.

During the exposure, the DUT computer was
running software that generated data to be
passed along the network and watched for the arrival
of these packets of information. While no direct
evidence of upsets was possible, as explained
previously, if data within a packet was corrupted,
the Myrinet hardware would drop the packet. The
missing packet would then be noticed by the DUT
software and recorded. This is the main type of error
observed. During exposure of the NIC it was also
possible to induce errors in the data stream after the
NIC had accepted a packet as valid. The methodology
and software were in place to observe these
types of errors as well.

The methodology flow was to place the device to 
be exposed in the beam, start the DUT and PXI 
software systems, turn the proton beam on, and, 
finally, observe the effects. The proton beam 
remained on until either a preset amount of fluence 
was achieved or a functional interrupt or latchup was 
observed. Initially, the preset fluence was set to a 
smaller amount due to the uncertainty in the total 
dose response of any of the devices. As the testing 
proceeded and the devices appeared to withstand the 
dose sufficiently well, preset fluences were set to 
levels such that there was typically a functional 
interrupt prior to the preset fluence level being
reached. 

For the errors that were observed, the test software
recorded all pertinent information about the errors,
including the manner in which they occurred
(e.g., whether a single packet was dropped or a
sequence of packets was lost in a very short time
span).

For functional interrupts, as much information as
could be gleaned from the test system was recorded.
In some instances it was simply that the DUT
computer rebooted, while in others it was detailed
information about which switch in the crossbar
devices induced the interrupt. If any latchup currents
had been observed, the device, the peak current seen
at the device, and the functionality after the latchup
would have been recorded.

IV. Results

A. Network Interface Card 

1) Single Event Latchup 

For the simple test being performed on this system,
the NIC current was monitored for the entire board.
Therefore, a latchup event in an individual component
would have to generate sufficient current to be
observable above the nominal NIC current. For all
five components exposed to the proton beam on the NIC,
no high NIC currents or destructive events were
observed. There were events on all five devices that
led to functional interrupts (discussed next). These
events could possibly have been produced by a high
current condition in the respective part, as a power
cycle of the DUT computer was required to reset after
the interrupt. However, since no events were
destructive, it is impossible to say whether latchup
played any role in these events.

2) Single Event Functional Interrupts (SEFI)

When any of the five devices was exposed to the
proton beam, the DUT computer system would
experience a SEFI at some point. This was seen as
the DUT computer either freezing or
initiating a self-reboot. In all instances observed for
all five devices, a power cycle of the DUT computer
that housed the NIC was required to regain
functionality. The SEFI cross sections measured for
the five devices are shown in the last column of
Table IV.

3) Single Event Upsets 

As discussed in the software section, missed
packets are the normal mechanism by which errors
appear in the test setup used here. For all but the
Samsung device, this is the upset mechanism that was
observed. For the Samsung SRAM part, the second upset
possibility arose: errors that are received but not
detected by the NIC. In other words, data was
correctly received by the NIC, processed by the Lanai9
processor, and stored in the Samsung SRAM. While in
this stored location, it was altered, and that
difference was detected. The upset cross sections
measured for the five devices are shown in the third
column of Table IV.


TABLE IV
Results Summary Table

Part      Accumulated  Upset Cross     SEFI Cross
          Dose (krad)  Section (cm^2)  Section (cm^2)
Lanai9    59.2         6.81E-12        1.14E-11
SerDeSer  53.1         1.52E-11        5.07E-12
PCIDMA    45.7         2.94E-12        1.18E-11
Vitesse   50.7         4.03E-10        7.95E-12
Samsung   9.1          8.84E-11        7.37E-11


It should be noted that two of these Samsung
SRAM parts were exposed during the testing (one on
each side of the board). It is not clear from the
Myricom documentation how much of each part is used
or whether their usage is equal. Therefore, the cross
section is reported as a total cross section for
both parts (not per bit or per device).
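
For readers unfamiliar with the quantity, a cross
section of this kind is conventionally the number of
observed events divided by the particle fluence. The
text does not spell out the formula used, so the Python
sketch below shows only the standard definition, with a
simple counting-statistics uncertainty. The function
names and the example numbers are hypothetical and are
not data from this test.

```python
import math


def cross_section(events: int, fluence: float) -> float:
    """Event cross section in cm^2: events / fluence,
    with fluence in protons/cm^2."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence


def poisson_sigma(events: int, fluence: float) -> float:
    """One-sigma uncertainty assuming Poisson counting statistics."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return math.sqrt(events) / fluence


if __name__ == "__main__":
    # Hypothetical example: 4 events observed in 1e12 protons/cm^2.
    print(cross_section(4, 1e12), poisson_sigma(4, 1e12))
```

With small event counts the counting uncertainty is a
large fraction of the value, which is worth keeping in
mind when comparing entries that differ by less than a
factor of a few.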

4) Total Dose 

While total dose testing was not explicitly included 
in this testing, proton dose is accumulated over the 
course of the test. No parametric measurements of the 
devices are feasible with this test setup. However, no 
functional loss or functional performance degradation 
was observed throughout this entire test. Therefore, it 
can be stated that the devices are total dose 
functionally survivable to at least the maximum 
proton dose during this test. These dose levels for the 
five devices on the NIC are shown in the second 
column of Table IV. 

B. Crossbar Switch Device

1) Single Event Latchup

For the test being performed on this system, the
crossbar switch current was monitored. For both
crossbar switch devices exposed to the proton beam,
no high currents or destructive events were observed.
There were events that led to functional interrupts.
These events did require a power cycle of the
crossbar power supply to reset after the interrupt.
However, since no events were destructive and no
high currents were observed for the switches, it is
likely that latchup did not play any
role in these events.

2) Single Event Functional Interrupts (SEFI)

Figure 7 and Figure 8 show the per-switch SEFI
cross section for the crossbar switch devices tested.
In Figure 7, it is assumed that all switches have the
same sensitivity whether they are on the Xbar
frontplane (FP) or backplane (BP), and whether they
are on Xbar #1 or #2. This cross section is plotted as
a function of the number of switches active during
that test (there are different percentages of the
switch locations for these four cases). The squares
and error bars are the average overall cross section,
assuming all switches are the same, and the one-sigma
standard deviation. The triangles are the cross
sections within each of the four cases. The four data
points almost lie within one sigma.

Figure 7. SEFI cross section as a function of the
number of switches.

Figure 8 examines the same data set under the
hypothesis that the SEFI rate could differ between
frontplane and backplane switches. The four cases
shown are the two frontplanes of the two Xbars,
the backplane switches, independent of which Xbar
houses them, and the overall cross section (the same
as the squares of Figure 7). While all of the data
points lie within the one-sigma error bars of the
overall cross section, there does appear to be a
difference between the frontplane and backplane
switches.

3) Single Event Upsets - Non-SEFI

Single Event Upsets (SEU) for the crossbar switch
devices are only evident as dropped packets. Data
was collected on the number of dropped packets and on
whether they occurred as a single dropped packet or
as a rapid sequence of dropped packets. This data was
collected for four different switch quantities, which
also had varying numbers of switches in the
frontplane and backplane.

Figure 9 shows the per-switch cross section as a
function of the number of switches in the test
configuration. It shows data for both single-packet
loss and multiple-packet loss. The multiple-packet
loss cross sections appear to be within
approximately one sigma of an average value for the
multiple events. The same cannot be said for
single-packet loss, as the two higher switch count
cases (those with backplane switches in the test
configuration) have substantially higher cross
sections.

Figure 8. SEFI cross section as a function of the
location of the affected switch.

Figure 9. SEU cross section as a function of the
number of switches for both single dropped packets
and multiple dropped packets.


The same data as shown in Figure 9 can be viewed
in another way by looking at the total cross section
(both single and multiple packet losses, and not
per-switch). This data is shown in Figure 10.

The two cases with the lowest number of total
switches (the cases with only frontplane switches)
have a cross section that is nearly an order of
magnitude lower than the two cases with the higher
number of switches. Both of these higher switch
count cases have the full sixteen switches from the
backplane incorporated in the path for the data
packets. The highest switch count case does have a
slightly higher cross section than the next lower
case, as it contains nine additional frontplane
switches (again, these cross sections are not
per-switch).

This SEU data appears to imply that having the
backplane switches in the data path will substantially
increase the data packet loss as compared to
running without backplane switches. It is possible
that there are physical differences between backplane
and frontplane switches in dealing with packets that
are not immediately evident from the Myricom
documentation.



Figure 10. Total SEU cross section as a function of 
the number of switches with details of switch 
locations. 


4) Total Dose 

As with the NIC, total dose testing was not 
explicitly included in the testing. However, proton 
dose is again accumulated over the course of the test 
on the crossbar switch devices (Xbar). No parametric 
measurements of the devices are feasible with this 
test setup. However, no functional loss or functional 
performance degradation was observed throughout 
this entire test. Therefore, it can be stated that the 
devices are total dose functionally survivable to at 
least the maximum proton dose during this test. 

For this test setup, however, some amount of
uncertainty exists for the dose levels of Xbar #2. This
is because the proton beam passes through Xbar #1
and then the board for Xbar #1 before impinging on
Xbar #2. While there is an unknown amount of
material between the two switches, it does not appear
to be substantial, and it is assumed that the
incremental doses on Xbar #1 are the same for Xbar
#2 when it is in place (Xbar #2 is only used when
more than seven switches are used in the routing).
These dose levels for the two crossbar switch devices
tested are 400 krads and 285 krads, for Xbar #1 and
#2 respectively.

V. Summary 

The Myricom Myrinet network system was 
evaluated for proton single event effects response. No 
indication of latchup was observed. Functional 
interrupts and data loss upsets were observed and 
their cross sections determined. 

Total dose numbers seen during the testing indicate 
a good tolerance to total dose. Single event upset and 
functional interrupt rates were substantial and all 
interrupts required a power cycle to regain 
functionality. Further testing of this technology needs 
access to the bit level information to more accurately 
assess the single event upset sensitivity and the 
possibilities of any mitigation techniques. 